Search results for "DNA sequences"

showing 3 items of 3 documents

Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences

2018

Several recent works have shown that the k-mer representation of a DNA sequence can be used to classify or identify sequences related to nucleosome positioning. This representation becomes computationally expensive as k grows, because the dimension of the feature space increases exponentially with k. This issue significantly affects the classification task performed by a general-purpose machine learning algorithm used for sequence classification. In this paper, we investigate the advantage offered by the so-called Variable Ranking Feature Selection method for selecting the most informative k-mers associated with a set of DNA sequences, for the final purpose of nucleosome/linker classifi…
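As a rough illustration of the representation described in this abstract, the sketch below counts overlapping k-mers over the alphabet ACGT, which gives a feature vector of length 4^k, and then ranks k-mers with a simple per-feature score. The function names `kmer_counts` and `rank_kmers` are hypothetical, and the score used here (absolute difference of mean counts between two classes) is only an assumed stand-in: the abstract does not state which ranking criterion the paper uses.

```python
from itertools import product

ALPHABET = "ACGT"

def kmer_counts(seq, k):
    """Return (kmers, counts): all 4**k possible k-mers and how often each occurs in seq."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for start in range(len(seq) - k + 1):
        # Windows containing other symbols (e.g. N) are not in the index and are skipped.
        pos = index.get(seq[start:start + k])
        if pos is not None:
            counts[pos] += 1
    return kmers, counts

def rank_kmers(class_a, class_b, k):
    """Rank k-mers by |mean count in class_a - mean count in class_b|.

    Assumption: this score is illustrative only, not the paper's criterion.
    """
    def mean_vector(seqs):
        total = [0] * (len(ALPHABET) ** k)
        for s in seqs:
            _, counts = kmer_counts(s, k)
            total = [t + c for t, c in zip(total, counts)]
        return [t / len(seqs) for t in total]

    kmers, _ = kmer_counts("", k)  # only the k-mer vocabulary is needed here
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    scores = [abs(a - b) for a, b in zip(mean_a, mean_b)]
    # Highest-scoring (most class-discriminating) k-mers first.
    return sorted(zip(kmers, scores), key=lambda pair: pair[1], reverse=True)
```

Keeping only the top-ranked k-mers from `rank_kmers` is one way to shrink the 4^k-dimensional vector before it is passed to a classifier.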

0301 basic medicine; Sequence; Settore INF/01 - Informatica; Epigenomic; 030102 biochemistry & molecular biology; business.industry; Computer science; Deep learning; Pattern recognition; Feature selection; DNA sequences; Nucleosomes; Ranking (information retrieval); Set (abstract data type); 03 medical and health sciences; Variable (computer science); 030104 developmental biology; Dimension (vector space); Deep learning models; Artificial intelligence; Representation (mathematics); business

researchProduct

AnABlast: Re-searching for Protein-Coding Sequences in Genomic Regions

2019

AnABlast is a computational tool that highlights protein-coding regions within intergenic and intronic DNA sequences that escape detection by standard gene prediction algorithms. DNA sequences with small protein-coding genes or exons, complex intron-containing genes, or degenerated DNA fragments are efficiently targeted by AnABlast. Furthermore, this algorithm is particularly useful in detecting protein-coding sequences with nonsignificant homologs to sequences in databases. AnABlast can be executed online at http://www.bioinfocabd.upo.es/anablast/.

Fossil DNA sequences; Protein coding; 0303 health sciences; Gene prediction; Coding DNA sequences; 030302 biochemistry & molecular biology; Computational biology; Biology; Gene finding; DNA sequencing; 03 medical and health sciences; Exon; chemistry.chemical_compound; Intergenic region; chemistry; Homologous chromosome; Small genes; Gene; In silico annotation tool; DNA; 030304 developmental biology

researchProduct

Normalised compression distance and evolutionary distance of genomic sequences: comparison of clustering results

2009

Genomic sequences are usually compared using evolutionary distance, a procedure that requires aligning the sequences. Alignment of long sequences is a time-consuming procedure, and the resulting dissimilarity is not a metric. Recently, the normalised compression distance was introduced as a method to calculate the distance between two generic digital objects, and it seems a suitable way to compare genomic strings. In this paper, the clustering and the non-linear mapping obtained using the evolutionary distance and the compression distance are compared, in order to understand whether the two sets of distances are similar.
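For orientation, the normalised compression distance between two strings x and y is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed size of s and xy is the concatenation of x and y. A minimal sketch follows, assuming zlib as the compressor; the abstract does not say which compressor the paper uses, and the helper names `compressed_size` and `ncd` are hypothetical.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression (assumed compressor, level 9)."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalised compression distance between two sequences given as ASCII strings."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compressed size of the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)
```

No alignment is needed: similar sequences compress well together, so their NCD is close to 0, while unrelated sequences give values close to 1. A general-purpose compressor such as zlib has a limited look-back window, so for long genomic sequences a different compressor would likely be a better choice.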

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; business.industry; Compression (functional analysis); Metric (mathematics); Normalized compression distance; universal similarity metric; USM; clustering; DNA sequences; normalised compression distance; evolutionary distance; genomic sequences; nonlinear mapping; bioinformatics; Pattern recognition; Artificial intelligence; Cluster analysis; business; Distance matrices in phylogeny; Mathematics

researchProduct